pacman::p_load(tidyverse, ggplot2, tsibble, tsibbledata,
fable, fabletools, feasts,
plotly, DT, fable.prophet)Take-home Exercise 4 updated for Project
Prototyping Modules for Visual Analytics Shiny Application
1 Overview
In this take-home exercise, we are required to select one of the module of our proposed Shiny application and complete the following tasks:
- To evaluate and determine the necessary R packages needed for your Shiny application are supported in R CRAN,
- To prepare and test the specific R codes can be run and returned the correct output as expected,
- To determine the parameters and outputs that will be exposed on the Shiny applications, and
- To select the appropriate Shiny UI components for exposing the parameters determine above.
The module that we are focusing on is the univariate forecasting of the data.
2 Loading R packages
First and foremost, we will start by loading the R packages required.
3 Data preparation
As this take home exercise serves as analysis of one of the module of the project, the data preparation steps is only performed once to ensure that same data set (weather_imputed_11stations.rds) are used across all members. Detailed preparation steps can be found in this link.
3.1 Importing the data
We import the processed data set using read_rds(),
weather <- read_rds("data/weather_imputed_11stations.rds")head(weather)# A tibble: 6 × 15
Station Date Year Month Day `Daily Rainfall Total (mm)`
<chr> <date> <dbl> <dbl> <dbl> <dbl>
1 Ang Mo Kio 2021-01-01 2021 1 1 94.4
2 Ang Mo Kio 2021-01-02 2021 1 2 114.
3 Ang Mo Kio 2021-01-03 2021 1 3 5.2
4 Ang Mo Kio 2021-01-04 2021 1 4 0
5 Ang Mo Kio 2021-01-05 2021 1 5 0
6 Ang Mo Kio 2021-01-06 2021 1 6 0
# ℹ 9 more variables: `Mean Temperature (°C)` <dbl>,
# `Maximum Temperature (°C)` <dbl>, `Minimum Temperature (°C)` <dbl>,
# LAT <dbl>, LONG <dbl>, Date_mine <date>, Month_Name <fct>, Week <dbl>,
# Weekday <dbl>
3.2 Convert to tsibble data frame
The dataset comprises daily records of rainfall and temperature gathered from various weather stations across Singapore. Initially, this data is loaded in a ‘tibble’ format. However, for our forecasting analysis, we will be using fable package, which is part of the tidyverts ecosystem and requires data in a ‘tsibble’ format. To accommodate this requirement, we will convert our tibble dataframe into a tsibble using the following code.
weather_tsbl <- as_tsibble(weather, key = Station, index = Date)
weather_tsbl# A tsibble: 12,045 x 15 [1D]
# Key: Station [11]
Station Date Year Month Day `Daily Rainfall Total (mm)`
<chr> <date> <dbl> <dbl> <dbl> <dbl>
1 Ang Mo Kio 2021-01-01 2021 1 1 94.4
2 Ang Mo Kio 2021-01-02 2021 1 2 114.
3 Ang Mo Kio 2021-01-03 2021 1 3 5.2
4 Ang Mo Kio 2021-01-04 2021 1 4 0
5 Ang Mo Kio 2021-01-05 2021 1 5 0
6 Ang Mo Kio 2021-01-06 2021 1 6 0
7 Ang Mo Kio 2021-01-07 2021 1 7 1.6
8 Ang Mo Kio 2021-01-08 2021 1 8 12.6
9 Ang Mo Kio 2021-01-09 2021 1 9 35.4
10 Ang Mo Kio 2021-01-10 2021 1 10 119.
# ℹ 12,035 more rows
# ℹ 9 more variables: `Mean Temperature (°C)` <dbl>,
# `Maximum Temperature (°C)` <dbl>, `Minimum Temperature (°C)` <dbl>,
# LAT <dbl>, LONG <dbl>, Date_mine <date>, Month_Name <fct>, Week <dbl>,
# Weekday <dbl>
The list below shows 11 unique stations.
Station <- weather_tsbl %>% distinct(Station)
datatable(Station,
class= "compact",
rownames = FALSE,
width="100%",
options = list(pageLength = 12,scrollX=T))What is tsibble?
A tsibble consists of a time index, key, and other measured variables in a data-centric format. In tsibble,
- Index is a variable with inherent ordering from past to present.
- Key is a set of variables that define observational units over time.
- Each observation should be uniquely identified by index and key.
- Each observational unit should be measured at a common interval, if regularly spaced.
4 Section 1: Time Series Exploration
In this section, users can interactively explore time series data using a line graph equipped with a time slider. They have the option to select specific stations, variables, and time periods for analysis.
# user defined parameter
selected_stations <- unique(weather_tsbl$Station)
variable_temp <- "Minimum Temperature (°C)"
variable_rain <- "Daily Rainfall Total (mm)"
start_date <- "2021-01-01"
end_date <- "2023-12-31"# Filter data based on period
filtered_period <- weather_tsbl %>%
filter_index(start_date ~ end_date)
# Filter period-filtered data based on multiple selected stations
filtered_period_mstn <- filtered_period %>%
filter(Station %in% selected_stations)
# Generate the plot based on filtered data
plot_ly(filtered_period_mstn,
x = ~Date,
y = as.formula(paste0("~`", variable_temp, "`")),
type = 'scatter',
mode = 'lines',
color = ~Station,
hoverinfo = 'text',
text = ~paste("<b>Station:</b>", Station,
"<br><b>Date:</b>", Date,
"<br><b>", variable_temp, ":</b>", filtered_period_mstn[[variable_temp]])) %>%
layout(title = paste(variable_temp, "by Station"),
xaxis = list(title = ""),
yaxis = list(title = "")) %>%
# Display the plot
layout(xaxis = list(rangeslider = list(type = "date"))) The code chunk below plot line graph for weekly summarise view of the following variables: “Mean Temperature (°C)”, “Minimum Temperature (°C)”, “Maximum Temperature (°C)”
# Summarise period-filtered temperature data based on weekly average
summarised_temp_week <- filtered_period %>%
group_by_key() %>%
index_by(year_week = ~ yearweek(.)) %>%
summarise(
avg_temp = round(mean(.data[[variable_temp]]),2), na.rm = TRUE)
# Filter summarised temperature data based on multiple selected stations
summarised_temp_mstn <- summarised_temp_week %>%
filter(Station %in% selected_stations) # Convert the year_week result to the first day of the week for plotting
summarised_temp_mstn <- summarised_temp_mstn %>%
mutate(week_start_date = floor_date(as.Date(year_week), unit = "week"))
# Generate the plot based on filtered data
plot_ly(summarised_temp_mstn, x = ~week_start_date, y = ~avg_temp,
type = 'scatter', mode = 'lines', color = ~Station,
hoverinfo = 'text',
text = ~paste("<b>Station:</b>", Station,
"<br><b>Week Starting:</b>", week_start_date,
"<br><b>Average", variable_temp, ":</b>", avg_temp)) %>%
layout(title = paste("Weekly Average", variable_temp, "by Station"),
xaxis = list(title = ""),
yaxis = list(title = "")) %>%
# Display the plot
layout(xaxis = list(rangeslider = list(type = "date"))) Line graph for monthly summarise view of the following variables: “Mean Temperature (°C)”, “Minimum Temperature (°C)”, “Maximum Temperature (°C)”
# Summarise period-filtered temperature data based on monthly average
summarised_temp_month <- filtered_period %>%
group_by_key() %>%
index_by(year_month = ~ yearmonth(.)) %>%
summarise(
avg_temp = round(mean(.data[[variable_temp]]),2), na.rm = TRUE)
# Filter summarised temperature data based on multiple selected stations
summarised_temp_mstn <- summarised_temp_month %>%
filter(Station %in% selected_stations) # Convert the year_week result to the first day of the week for plotting
summarised_temp_mstn <- summarised_temp_mstn %>%
mutate(month_start_date = floor_date(as.Date(year_month), unit = "month"))
# Generate the plot based on filtered data
plot_ly(summarised_temp_mstn, x = ~month_start_date, y = ~avg_temp,
type = 'scatter', mode = 'lines', color = ~Station,
hoverinfo = 'text',
text = ~paste("<b>Station:</b>", Station,
"<br><b>Month:</b>", year_month,
"<br><b>Average", variable_temp, ":</b>", avg_temp)) %>%
layout(title = paste("Monthly Average", variable_temp, "by Station"),
xaxis = list(title = ""),
yaxis = list(title = "")) %>%
# Display the plot
layout(xaxis = list(rangeslider = list(type = "date"))) Variable - Daily Rainfall Total (mm) Time resolution - week
# Summarise period-filtered rainfall data based on weekly total
summarised_rain_week <- filtered_period %>%
group_by_key() %>%
index_by(year_week = ~ yearweek(.)) %>%
summarise(
total_rainfall = sum(.data[[variable_rain]]), na.rm = TRUE)
# Filter summarised rainfall data based on multiple selected stations
summarised_rain_mstn <- summarised_rain_week %>%
filter(Station %in% selected_stations) # Convert the year_week result to the first day of the week for plotting
summarised_rain_mstn <- summarised_rain_mstn %>%
mutate(week_start_date = floor_date(as.Date(year_week), unit = "week"))
# Generate the plot based on filtered data
plot_ly(summarised_rain_mstn, x = ~week_start_date, y = ~total_rainfall,
type = 'scatter', mode = 'lines', color = ~Station,
hoverinfo = 'text',
text = ~paste("<b>Station:</b>", Station,
"<br><b>Week Starting:</b>", week_start_date,
"<br><b>Total rainfall :</b>", total_rainfall, "mm")) %>%
layout(title = paste("Weekly", variable_rain, "by Station"),
xaxis = list(title = ""),
yaxis = list(title = "")) %>%
# Display the plot
layout(xaxis = list(rangeslider = list(type = "date"))) Variable - Daily Rainfall Total (mm) Time resolution - Month
# Summarise period-filtered rainfall data based on weekly total
summarised_rain_month <- filtered_period %>%
group_by_key() %>%
index_by(year_month = ~ yearmonth(.)) %>%
summarise(
total_rainfall = sum(.data[[variable_rain]]), na.rm = TRUE)
# Filter summarised rainfall data based on multiple selected stations
summarised_rain_mstn <- summarised_rain_month %>%
filter(Station %in% selected_stations) # Convert the year_week result to the first day of the week for plotting
summarised_rain_mstn <- summarised_rain_mstn %>%
mutate(month_start_date = floor_date(as.Date(year_month), unit = "month"))
# Generate the plot based on filtered data
plot_ly(summarised_rain_mstn, x = ~~month_start_date, y = ~total_rainfall,
type = 'scatter', mode = 'lines', color = ~Station,
hoverinfo = 'text',
text = ~paste("<b>Station:</b>", Station,
"<br><b>Month:</b>", year_month,
"<br><b>Total rainfall :</b>", total_rainfall, "mm")) %>%
layout(title = paste("Monthly", variable_rain, "by Station"),
xaxis = list(title = ""),
yaxis = list(title = "")) %>%
# Display the plot
layout(xaxis = list(rangeslider = list(type = "date"))) 5 Section 2: Time Series Decomposition and auto correlation
This section introduces the STL method, a versatile and robust method that breaks down time series data into trend, seasonality, and remainder components.
STL is an acronym for “Seasonal and Trend decomposition using Loess”, while loess is a method for estimating nonlinear relationships. We will use STL to uncover deeper insights into our data, highlighting its importance in understanding and predicting trends.
Below are the sample plot of STL decomposition using different tuning parameter.
5.1 ACF & PACF
ACF(Autocorrelation function) measures the linear relationship between lagged values of a time series while PACF (Partial Autocorrelation Function) measures the correlation between observations with the effect of the intermediate observations removed. In this section, users can view the ACF and PACF to analyze the time-dependent characteristics of the selected time series data.
Users can choose a single station to generate the ACF and PACF plot. For the purpose of this exercise, we demonstrate using “Ang Mo Kio” station.
single_station <- "Ang Mo Kio"filtered_period_sstn <- filtered_period %>%
filter(Station %in% single_station)
ACF <- filtered_period_sstn %>%
ACF(filtered_period_sstn[[variable_temp]], lag_max = 100) %>%
autoplot() +
labs(title = paste("ACF plot of", variable_temp, "for", single_station)) +
theme_minimal()
ggplotly(ACF)summarised_temp_week_sstn <- summarised_temp_week %>%
filter(Station %in% single_station)
ACF <- summarised_temp_week_sstn %>%
ACF(avg_temp, lag_max = 50) %>%
autoplot() +
labs(title = paste("ACF plot of Weekly", variable_temp, "for", single_station)) +
theme_minimal()
ggplotly(ACF)filtered_period_sstn <- filtered_period %>%
filter(Station %in% single_station)
PACF <- filtered_period_sstn %>%
PACF(filtered_period_sstn[[variable_temp]], lag_max = 100) %>%
autoplot() +
labs(title = paste("PACF plot of", variable_temp, "for", single_station)) +
theme_minimal()
ggplotly(PACF)summarised_rain_week_sstn <- summarised_rain_week %>%
filter(Station %in% single_station)
PACF <- summarised_rain_week_sstn %>%
PACF(total_rainfall, lag_max = 50) %>%
autoplot() +
labs(title = paste("PACF plot of Weekly", variable_rain, "for", single_station)) +
theme_minimal()
ggplotly(PACF)5.2 STL Decomposition analysis
single_station <- "Ang Mo Kio"
filtered_period_sstn <- filtered_period %>%
filter(Station %in% single_station) stl_formula_default <- STL(`Mean Temperature (°C)`)
stl_formula_min <- STL(`Mean Temperature (°C)`
~ trend(window = 1) +
~ season(window = 1))
stl_formula_max <- STL(`Mean Temperature (°C)`
~ trend(window = 365) +
~ season(window = 365))stl_default <- filtered_period_sstn %>%
model(stl_formula_default) %>%
components() stl_default_tibble <- as_tibble(stl_default) %>%
select(Date, `Mean Temperature (°C)`, trend, season_week, season_year, remainder, season_adjust) %>%
mutate(trend = round(trend, 2),
season_adjust = round(season_adjust, 2),
season_week = round(season_week, 4),
season_year = round(season_year, 4),
remainder = round(remainder, 4))
datatable(stl_default_tibble,
class= "nowrap",
rownames = FALSE,
filter = 'top', # Enables filters at the top of each column
width="100%",
options = list(pageLength = 10, scrollX = TRUE))plot_stl_default <- stl_default %>%
autoplot()
ggplotly(plot_stl_default) %>% layout(width = 700, height = 700,
plot_bgcolor="#edf2f7")stl_min <- filtered_period_sstn %>%
model(stl_formula_min) %>%
components()stl_min_tibble <- as_tibble(stl_min) %>%
select(Date, `Mean Temperature (°C)`, trend, season_week, season_year, remainder, season_adjust) %>%
mutate(trend = round(trend, 2),
season_adjust = round(season_adjust, 2),
season_week = round(season_week, 4),
season_year = round(season_year, 4),
remainder = round(remainder, 4))
datatable(stl_min_tibble,
class= "nowrap",
rownames = FALSE,
filter = 'top', # Enables filters at the top of each column
width="100%",
options = list(pageLength = 10, scrollX = TRUE))plot_stl_min <- stl_min %>%
autoplot()
ggplotly(plot_stl_min) %>% layout(width = 700, height = 700,
plot_bgcolor="#edf2f7")stl_max <- filtered_period_sstn %>%
model(stl_formula_max) %>%
components()stl_max_tibble <- as_tibble(stl_max) %>%
select(Date, `Mean Temperature (°C)`, trend, season_week, season_year, remainder, season_adjust) %>%
mutate(trend = round(trend, 2),
season_adjust = round(season_adjust, 2),
season_week = round(season_week, 4),
season_year = round(season_year, 4),
remainder = round(remainder, 4))
datatable(stl_max_tibble,
class= "nowrap",
rownames = FALSE,
filter = 'top', # Enables filters at the top of each column
width="100%",
options = list(pageLength = 10, scrollX = TRUE))plot_stl_max <- stl_max %>%
autoplot()
ggplotly(plot_stl_max) %>% layout(width = 700, height = 700,
plot_bgcolor="#edf2f7")The STL decomposition plot consist of four panel. The bottom four panel shows breakdown of three components of STL, namely trend, seasonality, and remainder. These components can be added together to reconstruct the data shown in the top panel. The remainder component shown in the bottom panel is what is left over when the seasonal and trend-cycle components have been subtracted from the data.
5.2.1 STL test case
The below test case is run using the default STL formula.
stl_temp_week <- summarised_temp_week_sstn %>%
model(STL(avg_temp)) %>%
components() plot_stl_default <- stl_temp_week %>%
autoplot()
ggplotly(plot_stl_default) %>% layout(width = 700, height = 700,
plot_bgcolor="#edf2f7")stl_rain_week <- summarised_rain_week_sstn %>%
model(STL(total_rainfall)) %>%
components() plot_stl_default <- stl_rain_week %>%
autoplot()
ggplotly(plot_stl_default) %>% layout(width = 700, height = 700,
plot_bgcolor="#edf2f7")5.3 New plot to consider (ignore this part)
This plot do not show the correct label.
gg_season_plot <- filtered_period_sstn %>%
gg_season(`Mean Temperature (°C)`) +
theme_minimal()gg_season_plot
ggplotly(gg_season_plot)gg_tsdisplay(filtered_period_sstn, `Mean Temperature (°C)`, plot_type = "auto")
The below plot is not appropriate, since we only have 3 years of data.
# Summarise period-filtered temperature data based on weekly average
summarised_temp <- filtered_period %>%
group_by_key() %>%
index_by(year_month = ~ yearmonth(.)) %>%
summarise(
avg_temp = round(mean(.data[[variable_temp]]),2), na.rm = TRUE)
# Filter summarised temperature data based on multiple selected stations
summarised_temp_sstn <- summarised_temp %>%
filter(Station %in% single_station) summarised_temp_sstn# A tsibble: 36 x 4 [1M]
# Key: Station [1]
Station year_month avg_temp na.rm
<chr> <mth> <dbl> <lgl>
1 Ang Mo Kio 2021 Jan 23.8 TRUE
2 Ang Mo Kio 2021 Feb 24.4 TRUE
3 Ang Mo Kio 2021 Mar 24.7 TRUE
4 Ang Mo Kio 2021 Apr 24.9 TRUE
5 Ang Mo Kio 2021 May 25.9 TRUE
6 Ang Mo Kio 2021 Jun 25.3 TRUE
7 Ang Mo Kio 2021 Jul 26.1 TRUE
8 Ang Mo Kio 2021 Aug 24.7 TRUE
9 Ang Mo Kio 2021 Sep 25.2 TRUE
10 Ang Mo Kio 2021 Oct 25.8 TRUE
# ℹ 26 more rows
summarised_temp_sstn %>% gg_subseries(avg_temp)
6 Section 3: Time Series Forecasting [daily]
6.1 Split data into training and testing
# Define the split point; for example, keeping the first 80% of rows for training
split_point <- nrow(filtered_period_sstn) * 0.8
# Create the training dataset (first 80% of the data)
train_daily <- filtered_period_sstn %>%
slice(1:floor(split_point))
# Create the test dataset (remaining 20% of the data)
test_daily <- filtered_period_sstn %>%
slice((floor(split_point) + 1):n())6.2 Create and fit multiple model to tesing set
We observed that our data exhibits seasonal patterns, and hence we select the models specifically designed to handle such seasonal variations.
train_daily_fit <- train_daily %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(stl_formula_default, NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(stl_formula_default, ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(stl_formula_default, ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(`Mean Temperature (°C)`),
# AUTO prophet
AUTOprophet = prophet(`Mean Temperature (°C)`),
# Auto Exponential smoothing
AUTOETS = ETS(`Mean Temperature (°C)`)
)
forecast_horizon <- nrow(test_daily)
# Forecasting
train_daily_fc <- forecast(train_daily_fit, h = forecast_horizon)
# Plotting the forecasts
c <- autoplot(train_daily, `Mean Temperature (°C)`) +
autolayer(test_daily, `Mean Temperature (°C)`, series = "Test Data") +
autolayer(train_daily_fc, level = NULL) +
labs(title = "Forecast Validation for daily mean temperature of Ang Mo Kio Station") +
theme_minimal()
ggplotly(c, tooltip = c("x", "y", ".model"))6.3 Testing set forcast & Accuracy Evaluation
accuracy_metrics <- accuracy(train_daily_fc, test_daily) %>%
arrange(.model) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE) %>%
mutate(across(c(RMSE, MAE, MAPE, MASE), round, 2))
datatable(accuracy_metrics,
class= "hover",
rownames = FALSE,
width="100%",
options = list(pageLength = 10,scrollX=T))6.3.1 Residual plot
residual <- train_daily %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(stl_formula_default, NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(stl_formula_default, ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(stl_formula_default, ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(`Mean Temperature (°C)`),
# AUTO prophet
AUTOprophet = prophet(`Mean Temperature (°C)`),
# Auto Exponential smoothing
AUTOETS = ETS(`Mean Temperature (°C)`)
) %>%
augment()
a <- autoplot(residual, .innov) +
labs(title = "Residual Plot") +
theme_minimal()
ggplotly(a)6.4 Refit to Full Dataset & Forecast Forward
# Refit models to the full dataset
full_fit <- filtered_period_sstn %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(stl_formula_default, NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(stl_formula_default, ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(stl_formula_default, ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(`Mean Temperature (°C)`),
# AUTO prophet
AUTOprophet = prophet(`Mean Temperature (°C)`),
# Auto Exponential smoothing
AUTOETS = ETS(`Mean Temperature (°C)`)
)future_horizon <- "1 month" # Adjust this to your forecast needs
full_forecast <- forecast(full_fit, h = future_horizon)full_forecast_df <- as_tibble(full_forecast)
full_forecast_df <- full_forecast_df %>%
mutate(.mean = round(.mean, 2)) %>%
mutate(n=n(), sd=sd(.mean)) %>%
mutate(se=sd/sqrt(n-1)) %>%
mutate(Lower = round(.mean - (1.96 * se), 2),
Upper = round(.mean + (1.96 * se), 2))# Initialize an empty plotly object
p <- plot_ly()
# Unique models for iteration and color assignment
unique_models <- unique(full_forecast_df$.model)
colors <- RColorBrewer::brewer.pal(n = length(unique_models), name = "Set1")
# Loop through each model to add to the plot
for (i in seq_along(unique_models)) {
model_name <- unique_models[i]
# Filter data for the current model
model_data <- filter(full_forecast_df, .model == model_name)
# Define the custom hovertemplate for lines
hovertemplate_line <- paste(
"Date: %{x}<br>",
"Mean Temperature (°C): %{y:.2f}<br>",
"Model: ", model_name, "<br>",
"95% CI: [%{customdata[0]:.2f}, %{customdata[1]:.2f}]<extra></extra>"
)
# Customdata for the line (lower and upper CI values)
custom_data <- mapply(function(lower, upper) list(lower, upper), model_data$Lower, model_data$Upper, SIMPLIFY = FALSE)
# Add forecast line with custom tooltip and legend grouping
p <- add_lines(p, data = model_data, x = ~Date, y = ~`.mean`, name = model_name,
line = list(color = colors[i]), hovertemplate = hovertemplate_line,
customdata = custom_data, legendgroup = model_name)
# Add confidence interval ribbon with legend grouping, but no separate legend entry
p <- add_ribbons(p, data = model_data, x = ~Date, ymin = ~Lower, ymax = ~Upper,
fillcolor = scales::alpha(colors[i], 0.2), line = list(color = "transparent"),
legendgroup = model_name, showlegend = FALSE, hoverinfo = "skip")
}
# Customize layout
p <- p %>% layout(title = "Future Forecast Plot with 95% CI for daily mean temperature of Ang Mo Kio Station",
xaxis = list(title = "Date"),
yaxis = list(title = "Mean Temperature (°C)"),
legend = list(title = list(text = 'Model')),
hovermode = 'closest')
# Display the plot
p# Table view
col<- full_forecast_df %>%
select(.model, Date, .mean) %>%
rename(Forecast = .mean) %>%
mutate(Forecast = round(Forecast, 2),
.model = as.factor(.model))
datatable(col,
class= "hover",
rownames = FALSE,
filter = 'top', # Enables filters at the top of each column
width="100%",
options = list(pageLength = 6, scrollX = TRUE))7 Section 3.1: Time Series Forecasting [Temperature-weekly average]
7.1 Split data into training and testing
# Define the split point; for example, keeping the first 80% of rows for training
split_point <- nrow(summarised_temp_week_sstn) * 0.8
# Create the training dataset (first 80% of the data)
train_temp_week <- summarised_temp_week_sstn %>%
slice(1:floor(split_point))
# Create the test dataset (remaining 20% of the data)
test_temp_week <- summarised_temp_week_sstn %>%
slice((floor(split_point) + 1):n())7.2 Create and fit multiple model to tesing set
We observed that our data exhibits seasonal patterns, and hence we select the models specifically designed to handle such seasonal variations.
train_temp_week_fit <- train_temp_week %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(STL(avg_temp), NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(STL(avg_temp), ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(STL(avg_temp), ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(avg_temp),
# AUTO prophet
AUTOprophet = prophet(avg_temp),
# Auto Exponential smoothing
AUTOETS = ETS(avg_temp)
)
forecast_horizon <- nrow(test_temp_week)
# Forecasting
train_temp_week_fc <- forecast(train_temp_week_fit, h = forecast_horizon)# Plotting the forecasts
c <- autoplot(train_temp_week, avg_temp) +
autolayer(test_temp_week, avg_temp, series = "Test Data") +
autolayer(train_temp_week_fc, level = NULL) +
labs(title = "Forecast Validation for weekly mean temperature of Ang Mo Kio Station") +
theme_minimal()
ggplotly(c, tooltip = c("x", "y", ".model"))7.3 Testing set forcast & Accuracy Evaluation
accuracy_metrics <- accuracy(train_temp_week_fc, test_temp_week) %>%
arrange(.model) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE) %>%
mutate(across(c(RMSE, MAE, MAPE, MASE), round, 2))
datatable(accuracy_metrics,
class= "hover",
rownames = FALSE,
width="100%",
options = list(pageLength = 10,scrollX=T))7.3.1 Residual plot
residual <- train_temp_week %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(STL(avg_temp), NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(STL(avg_temp), ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(STL(avg_temp), ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(avg_temp),
# AUTO prophet
AUTOprophet = prophet(avg_temp),
# Auto Exponential smoothing
AUTOETS = ETS(avg_temp)
)%>%
augment()
a <- autoplot(residual, .innov) +
labs(title = "Residual Plot") +
theme_minimal()
ggplotly(a)7.4 Refit to Full Dataset & Forecast Forward
# Refit models to the full dataset
full_temp_week_fit <- summarised_temp_week_sstn %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(STL(avg_temp), NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(STL(avg_temp), ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(STL(avg_temp), ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(avg_temp),
# AUTO prophet
AUTOprophet = prophet(avg_temp),
# Auto Exponential smoothing
AUTOETS = ETS(avg_temp)
)# Plotting the forecasts along with the full dataset
future_horizon <- "10 week" # Adjust this to your forecast needs
full_temp_week_forecast <- forecast(full_temp_week_fit, h = future_horizon)e <- autoplot(full_temp_week_forecast, level = 95) +
labs(title = "Future Forecast Plot for weekly mean temperature of Ang Mo Kio Station") +
theme_minimal()
ggplotly(e, tooltip = c("x", "y",".model"))full_temp_week_forecast_tibble <- as_tibble(full_temp_week_forecast)
col <- full_temp_week_forecast_tibble %>%
mutate(year_week = paste(year(year_week), "w",
sprintf("%02d", isoweek(year_week)), sep="")) %>%
select(.model, year_week, .mean) %>%
rename(Forecast = .mean) %>%
mutate(Forecast = round(Forecast, 2),
.model = as.factor(.model),
year_week = as.factor(year_week))
datatable(col,
class= "hover",
rownames = FALSE,
filter = 'top',
width="100%",
options = list(pageLength = 6, scrollX = TRUE))8 Section 3.2: Time Series Forecasting [Rainfall-weekly total]
8.1 Split data into training and testing
# Define the split point; for example, keeping the first 80% of rows for training
split_point <- nrow(summarised_rain_week_sstn) * 0.8
# Create the training dataset (first 80% of the data)
train_rain_week <- summarised_rain_week_sstn %>%
slice(1:floor(split_point))
# Create the test dataset (remaining 20% of the data)
test_rain_week <- summarised_rain_week_sstn %>%
slice((floor(split_point) + 1):n())8.2 Create and fit multiple model to tesing set
We observed that our data exhibits seasonal patterns, and hence we select the models specifically designed to handle such seasonal variations.
train_rain_week_fit <- train_rain_week %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(STL(total_rainfall), NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(STL(total_rainfall), ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(STL(total_rainfall), ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(total_rainfall),
# AUTO prophet
AUTOprophet = prophet(total_rainfall),
# Auto Exponential smoothing
AUTOETS = ETS(total_rainfall)
)
forecast_horizon <- nrow(test_rain_week)
# Forecasting
train_rain_week_fc <- forecast(train_rain_week_fit, h = forecast_horizon)# Plotting the forecasts
c <- autoplot(train_rain_week, total_rainfall) +
autolayer(test_rain_week, total_rainfall, series = "Test Data") +
autolayer(train_rain_week_fc, level = NULL) +
labs(title = "Forecast Validation for weekly total rainfall of Ang Mo Kio Station") +
theme_minimal()
ggplotly(c, tooltip = c("x", "y", ".model"))8.3 Testing set forcast & Accuracy Evaluation
accuracy_metrics <- accuracy(train_rain_week_fc, test_rain_week) %>%
arrange(.model) %>%
select(.model, .type, RMSE, MAE, MAPE, MASE) %>%
mutate(across(c(RMSE, MAE, MAPE, MASE), round, 2))
datatable(accuracy_metrics,
class= "hover",
rownames = FALSE,
width="100%",
options = list(pageLength = 10,scrollX=T))8.3.1 Residual plot
residual <- train_rain_week %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(STL(total_rainfall), NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(STL(total_rainfall), ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(STL(total_rainfall), ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(total_rainfall),
# AUTO prophet
AUTOprophet = prophet(total_rainfall),
# Auto Exponential smoothing
AUTOETS = ETS(total_rainfall)
)%>%
augment()
a <- autoplot(residual, .innov) +
labs(title = "Residual Plot") +
theme_minimal()
ggplotly(a)8.4 Refit to Full Dataset & Forecast Forward
# Refit models to the full dataset
full_rain_week_fit <- summarised_rain_week_sstn %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(STL(total_rainfall), NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(STL(total_rainfall), ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(STL(total_rainfall), ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(total_rainfall),
# AUTO prophet
AUTOprophet = prophet(total_rainfall),
# Auto Exponential smoothing
AUTOETS = ETS(total_rainfall)
)# Plotting the forecasts along with the full dataset
future_horizon <- "10 week" # Adjust this to your forecast needs
full_rain_week_forecast <- forecast(full_rain_week_fit, h = future_horizon)e <- autoplot(full_rain_week_forecast, level = 95) +
labs(title = "Future Forecast Plot for weekly total rainfall of Ang Mo Kio Station") +
theme_minimal()
ggplotly(e, tooltip = c("x", "y",".model"))full_rain_week_forecast_tibble <- as_tibble(full_rain_week_forecast)
col <- full_rain_week_forecast_tibble %>%
mutate(year_week = paste(year(year_week), "w",
sprintf("%02d", isoweek(year_week)), sep="")) %>%
select(.model, year_week, .mean) %>%
rename(Forecast = .mean) %>%
mutate(Forecast = round(Forecast, 2),
.model = as.factor(.model),
year_week = as.factor(year_week))
datatable(col,
class= "hover",
rownames = FALSE,
filter = 'top',
width="100%",
options = list(pageLength = 6, scrollX = TRUE))9 Section 3.3: Time Series Forecasting [daily]
9.1 Min STL
stl_formula_min <- STL(`Mean Temperature (°C)`
~ trend(window = 1) +
~ season(window = 1))residual <- train_daily %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(stl_formula_min, NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(stl_formula_min, ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(stl_formula_min, ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(`Mean Temperature (°C)`),
# AUTO prophet
AUTOprophet = prophet(`Mean Temperature (°C)`),
# Auto Exponential smoothing
AUTOETS = ETS(`Mean Temperature (°C)`)
) %>%
augment()
a <- autoplot(residual, .innov) +
theme_minimal()
ggplotly(a)9.2 Max STL
stl_formula_max <- STL(`Mean Temperature (°C)`
~ trend(window = 365) +
~ season(window = 365))residual <- train_daily %>%
model(
# naïve forecast of the seasonally adjusted data
STLNaive = decomposition_model(stl_formula_max, NAIVE(season_adjust)),
# auto arima forecast of the seasonally adjusted data
STLArima = decomposition_model(stl_formula_max, ARIMA(season_adjust)),
# Exponential Smoothing forecast of the seasonally adjusted data
STLETS = decomposition_model(stl_formula_max, ETS(season_adjust ~ season("N"))),
# AUTO arima
AUTOARIMA = ARIMA(`Mean Temperature (°C)`),
# AUTO prophet
AUTOprophet = prophet(`Mean Temperature (°C)`),
# Auto Exponential smoothing
AUTOETS = ETS(`Mean Temperature (°C)`)
) %>%
augment()
a <- autoplot(residual, .innov) +
theme_minimal()
ggplotly(a)10 UI design
10.1 Section 1

10.2 Section 2

10.3 Section 3

11 Reference
Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3.